AD-A233  655 


Technical  Report  1408 

April  1991 

Iterated  Transform 
Image  Compression 


Y.  Fisher 
E.  W.  Jacobs 
R.  D.  Boss 




Approved for public release; distribution is unlimited.




NAVAL  OCEAN  SYSTEMS  CENTER 

San  Diego,  California  92152-5000 

J. D. FONTANA, CAPT, USN, Commander

H. R. TALKINGTON, Acting Technical Director


ADMINISTRATIVE  INFORMATION 

This  work  was  conducted  under  the  sponsorship  of  NOSC  in-house  Independent 
Exploratory  Development  Program  (IED).  The  work  was  performed  during  fiscal  year  1990  and 
funded  under  program  element  0602936N.  The  work  was  performed  by  members  of  Code  633, 
Naval  Ocean  Systems  Center,  San  Diego,  CA  92152-5000. 


Released  by 
J.  C.  Hicks,  Head 
Research  Branch 


Under  authority  of 
R.  H.  Moore,  Head 
ASW  Technology  Division 




SUMMARY 


OBJECTIVES 

To  present  in  a  clear  manner  an  algorithm  based  on  iterated  transforms  that  can 
be  used  to  compress  grayscale  images.  Demonstrate  the  algorithm  and  present  results 
for  various  images. 

RESULTS  AND  CONCLUSIONS 


The theoretical framework for iterated transform image compression has been
generalized to include noncontractive transforms. The method has been used to
encode the 512 x 512 8-bpp image of "Lena" at a compression of 15.9:1 with the
decoded image having a root mean square error of 6.33 (32.1 dB). Other images have
been  encoded  at  various  encoding  conditions  with  the  resulting  compressions  ranging 
from  10:1  to  63:1.  It  was  shown  that  the  relaxation  of  the  contractivity  constraint 
can  lead  to  improvement  in  the  fidelity  of  decoded  images. 

1.  INTRODUCTION


Lately,  there  has  been  much  interest  in  the  use  of  fractals  to  generate  images. 
Barnsley  and  Sloan  have  popularized  the  converse  notion  of  encoding  images  using 
fractals,  see  Barnsley  (1988)  and  Barnsley  and  Sloan  (1988).  In  this  report,  we  present 
a  scheme  with  a  derivative  dependence  on  fractals,  which  allows  the  encoding  of 
monochrome  images  as  a  set  of  transformations.  This  leads  to  a  resulting  decrease  in 
the  memory  required  to  store  an  image.  The  reconstructed  image  is  an  approximation 
of  the  original  image,  with  a  tradeoff  between  compression  and  fidelity.  Although 
the  current  implementation  of  the  encoding  scheme  is  not  superior  to  other  lossy 
compression  techniques,  many  possible  improvements,  which  may  lead  to  superior 
performance,  have  yet  to  be  explored. 

Hutchinson  (1981)  introduced  the  theory  of  iterated  function  systems  (a  term 
coined  by  Barnsley)  to  model  self-similar  sets  (such  as  in  figure  3).  Demko,  Hodges, 
and  Naylor  (1988)  first  suggested  using  iterated  function  systems  to  model  complex 
objects  in  computer  graphics.  Barnsley,  Demko,  Elton,  Sloan  and  others  generalized 
the  concepts  and  suggested  the  use  of  fractals  to  model  “natural  scenes.”  In  his 
thesis,  Jacquin  (1989)  described  an  image-encoding  scheme  based  on  iterated  Markov 
operators on measure spaces and used it to encode 6-bit/pixel (bpp) monochrome
images.  This  report  presents  a  reformulation  of  this  theory  in  a  simplified  and  clarified 
manner.  The  theory  has  also  been  generalized  to  include  noncontractive  transforms. 
Eliminating  the  contractivity  constraint  for  individual  transforms  has  not  previously 
been  considered,  and  it  is  shown  that  it  can  result  in  a  modest  improvement  in  image 
quality. 


2.  A  SIMPLE  EXAMPLE 


The  example  in  this  section  serves  as  a  simple  illustration  of  the  concepts  involved 
in  the  image-encoding  scheme  presented  later.  This  example  is  based  on  iterated 
function  systems  (IFS)  which  have  been  popularized  by  Barnsley  (1988).  Although 
the scheme presented later and iterated function systems have several features in
common, the reader should be forewarned that they are not the same. The main concept
is that the image of a set (a Sierpinski gasket, in this case) can be reconstructed
from a set of transformations which may take less memory to store than the original.
As  much  mathematical  formalism  as  possible  has  been  excluded  from  this  section, 
leaving  this  for  section  3. 

Consider  the  three  transformations  shown  in  figure  1.  They  are 


For  any  set  S,  let 


$$W(S) = \bigcup_{i=1}^{3} w_i(S).$$


Denote the n-fold composition of W with itself as $W^{\circ n}$. Define $A_n = W(A_{n-1}) =
W^{\circ n}(A_0)$ and arbitrarily choose $A_0$ as the unit square with lower left corner at the
origin (i.e., $A_0 = \{(x,y) \mid 0 \le x \le 1,\ 0 \le y \le 1\}$). Then as $n \to \infty$, the set $A_n$
converges (in a sense not defined, but certainly visually) to a limit set $A_\infty$. In fact,
for any compact set $S \subset R^2$, $W^{\circ n}(S) \to A_\infty$ as $n \to \infty$. Figure 2 shows $A_1$, $A_2$, $A_3$,
and $A_4$. Figure 3 shows the limit set $A_\infty$.


Figure  1.  Three  affine  transformations  in  the  plane. 


That all compact initial sets converge under iteration to $A_\infty$ is important: it
means that the set $A_\infty$ is defined by the $w_i$ only. It is not difficult to see why this is
true: The $w_i$ are contractive (in this case they halve the diameter of any set to which
they are applied). Thus, any initial $A_0$ will shrink to a point in the limit as the $w_i$
are repeatedly applied.

Figure 2.  $A_1 = W(A_0)$ and its images $A_2$, $A_3$, and $A_4$.

Figure 3.  Limit set $A_\infty = \lim_{n \to \infty} W^{\circ n}(A_0)$.

Each $w_i$ is determined by 6 real values, so that for this example 18 floating point
numbers are required. In single precision, this requires 72 bytes. The memory required
to store an image of the set depends on the resolution; figure 3 requires 256 x 256 x 1
bit = 8192 bytes of memory. The resulting compression ratio in this example is 113.8:1.
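
As an illustration, the iteration of W and the storage arithmetic above can be
sketched in a few lines of Python. The three translations used below are an assumed,
commonly used choice that produces a Sierpinski gasket; they are not taken from
figure 1. The names TRANSLATIONS, apply_W, and iterate are likewise illustrative.
Each iterate is represented exactly as a collection of squares (x, y, side).

```python
from fractions import Fraction as Fr

HALF = Fr(1, 2)

# Assumed translations: each map halves all distances, then shifts the result.
# These values are a common textbook choice, not the ones in the report.
TRANSLATIONS = [(Fr(0), Fr(0)), (Fr(1, 2), Fr(0)), (Fr(1, 4), Fr(1, 2))]


def apply_W(squares):
    """Apply W(S) = w_1(S) U w_2(S) U w_3(S) to a set of squares (x, y, side)."""
    return {
        (x * HALF + tx, y * HALF + ty, side * HALF)
        for (x, y, side) in squares
        for (tx, ty) in TRANSLATIONS
    }


def iterate(squares, n):
    """Return the n-fold composition of W applied to the given set."""
    for _ in range(n):
        squares = apply_W(squares)
    return squares


# A_0 is the unit square with lower left corner at the origin.
A0 = {(Fr(0), Fr(0), Fr(1))}
A4 = iterate(A0, 4)
area_A4 = sum(side * side for (_, _, side) in A4)

# Storage comparison from the text: 3 maps x 6 values x 4 bytes versus a
# 256 x 256 bitmap at 1 bit per pixel.
ifs_bytes = 3 * 6 * 4
bitmap_bytes = 256 * 256 // 8
compression_ratio = bitmap_bytes / ifs_bytes
```

Under these assumptions the fourth iterate consists of 81 squares of side 1/16, and
8192/72 rounds to the ratio of 113.8 quoted above.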

In  the  example,  the  image  of  the  Sierpinski  gasket  is  described  by  a  set  of  pixels, 
each  being  either  black  or  white.  It  is  inherently  difficult  to  find  an  IFS  which  will 
encode  an  arbitrary  set.  The  theory  of  IFS’s  has  been  extended  by  Barnsley  and 
Jacquin  (1988)  to  allow  transforms  to  operate  on  only  parts  of  the  set  rather  than 
the  entire  set,  in  a  method  they  call  recurrent  iterated  function  systems  (RIFS).  This 
extension  can  encode  a  larger  set  of  images  (Barnsley  and  Jacquin,  1988),  and  has 
been used in a fully automated encoding system (Jacobs, Boss, and Fisher, 1990).
The  problem  addressed  in  this  report  is  the  encoding  of  general  monochrome  images 
(i.e.,  an  image  in  which  each  pixel  has  many  possible  gray  levels,  not  just  black  or 
white).  This  type  of  image  can  be  thought  of  as  a  three-dimensional  object,  each 
pixel  having  an  x,y  coordinate,  and  an  intensity  value  z.  To  apply  the  basic  concepts 
of  RIFS  to  a  three-dimensional  image,  one  need  only  generalize  the  transformations 
to  three  dimensions. 


3.  THE  MODEL 


3.1  INTRODUCTION 

This section contains a description of an iterative dynamical system (a space and
a map from the space to itself) $W : F \to F$ used to model and encode images. The
space F is a space of images, and the mapping W is a contraction. This ensures that
the dynamical system has the most boring dynamics possible: rapid convergence
to a fixed point. The goal is to construct the mapping W so that its fixed point is
"close" to any given image that is to be encoded. Decoding then consists of iterating
the mapping W from any initial image until the iterates converge to the fixed point.

Let $I = [0, 1]$ and $I^n$ be the n-fold Cartesian product of I with itself. Let F be
the space consisting of all graphs of real Lebesgue measurable functions $z = f(x,y)$
with $(x, y, f(x,y)) \in I^3$. Note that f is required to be bounded. A point in F can be
thought of as an abstract image of infinite resolution, with $f(x,y)$ representing the
gray level (with 0 being black and 1 being white) at the point $(x,y)$ in the image.
Images with finite resolutions can be modeled by partitioning $I^2$ with a rectilinear
grid and either insisting that f be constant on the boxes of the grid, or by averaging
f over each box. Color images can be encoded as graphs of functions $f : I^2 \to I^3$
with range points representing the color model of choice, for example RGB values.

Since the contractive mapping fixed point theorem requires a complete metric
space, a carefully defined space and an associated metric is needed. The problem
becomes one of finding a metric on F. Since the choice of metric in this paper serves
mostly as motivation, the metric will be chosen to be as simple as possible:

$$\delta(f, g) = \sup_{(x,y) \in I^2} |f(x,y) - g(x,y)|. \tag{1}$$

The pair $(F, \delta)$ forms a complete metric space.
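
For a finite-resolution image the supremum in equation 1 becomes a maximum over
pixels. The following minimal sketch assumes images are stored as equally sized
nested lists of gray levels in [0, 1]; the function name sup_metric and the sample
values are illustrative only.

```python
def sup_metric(f, g):
    """Equation 1 on a pixel grid: largest absolute gray-level difference."""
    return max(
        abs(a - b)
        for row_f, row_g in zip(f, g)
        for a, b in zip(row_f, row_g)
    )


# Two hypothetical 2 x 2 images with gray levels in [0, 1].
f_img = [[0.0, 0.5], [1.0, 0.25]]
g_img = [[0.1, 0.5], [0.4, 0.25]]
distance = sup_metric(f_img, g_img)  # dominated by the single worst pixel
```

Because one badly matched pixel determines the distance, this metric is simple to
reason about but harsh as a measure of visual similarity.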

Other  image  models  have  been  described,  notably  that  of  a  positive  measure  over 
a  Borel  field  (Jacquin,  1989).  The  argument  for  using  this  model  is  compelling:  at 
any  resolution  viewing  an  image  consists  of  sensing  the  flux  of  light  through  many 
small  areas,  for  example  the  light  that  falls  on  each  cone  or  rod  in  the  retina.  This 
flux  can  be  naturally  thought  of  as  the  measure  of  a  small  area  in  the  image.  We 
have  opted  not  to  adopt  this  model  for  the  following  reasons:  First,  the  image  model 
which  is  in  fact  manipulated,  is  not  naturally  a  measure,  but  a  function  (which  can  be 
thought  of  as  modeling  a  measure).  Second,  the  space  of  measures  is  more  difficult  to 
metrize  than  F.  In  any  case,  this  model  has  all  the  versatility  of  the  measure  model 
without  the  extraneous  mathematical  baggage. 

The following sections describe two types of transformations $W : F \to F$ which
will be used to encode and decode images. The first, called z contractions, are simpler
to define and generate than the second, called eventual contractions. Both types
of transformations will define contractive maps, enabling the use of the contractive
mapping fixed point theorem to find fixed points of W easily.


3.2  Z-CONTRACTIVE  MAPPINGS 

The  precise  statement  of  the  mappings  used  requires  several  definitions. 

Let $\pi_z : I^3 \to I^2$ be the projection operator defined by $\pi_z(x,y,z) = (x,y)$.

A map $w : R^3 \to R^3$ is said to be z-contractive with z-contractivity s if there
exists a positive real number $s < 1$ such that for all $x, y, z_1, z_2 \in R$,

$$|w(x,y,z_1) - w(x,y,z_2)| \le s\,|z_1 - z_2|, \tag{2}$$

and $\pi_z \circ w(x,y,z)$ does not depend on z.

Let $D_1, \ldots, D_n$ be subsets of $I^2$ and $v_1, \ldots, v_n : I^3 \to I^3$ be some collection of
maps. Define $w_i$ as the restriction

$$w_i = v_i|_{D_i \times I}.$$

The maps $w_1, \ldots, w_n$ are said to tile $I^2$ if for all $f \in F$, $\bigcup_{i=1}^{n} w_i(f) \in F$. This means
the following: for an image $f \in F$, each $D_i$ defines a part of the image $f \cap (D_i \times I)$
to which $w_i$ is restricted. When $w_i$ is applied to this part, the result must be a graph
of a function over $R_i = \pi_z \circ w_i(f)$. It is also required that $I^2 = \bigcup_{i=1}^{n} R_i$, i.e., that the
union $\bigcup_{i=1}^{n} w_i(f)$ yield a graph of a function over $I^2$. In particular, this means that
the $R_i$ are disjoint. See figure 4. The map W is defined as

$$W = \bigcup_{i=1}^{n} w_i. \tag{3}$$

Note that $W : F \to F$ is well defined only when the collection $w_1, \ldots, w_n$ tiles $I^2$.



Figure  4.  Parts  of  the  tiling  of  an  image. 


During the rest of the discussion, it is assumed that $w_1, \ldots, w_n$ tile $I^2$ and that W is
defined as in equation 3.

If there exists a positive $s < 1$ such that for any $f, g \in F$,

$$\delta(W(f), W(g)) \le s\,\delta(f, g), \tag{4}$$

then W is called a contraction and s is called the contractivity of W. Note that W
may be a contraction and still separate points in the x and y directions.

Claim 3.1  If $w_1, \ldots, w_n$ are z contractions, then W is a contraction.

Proof.  Let $s_1, \ldots, s_n$ be the z contractivities for the transforms $w_1, \ldots, w_n$. For
$f, g \in F$ and some s, $1 > s \ge \max_{i=1,\ldots,n}\{s_i\}$,

$$\begin{aligned}
\delta(W(f), W(g)) &= \sup\{|W(f(x,y)) - W(g(x,y))| : (x,y) \in I^2\} \\
&= \sup\{|[w_i(x,y,f(x,y)) - w_i(x,y,g(x,y))] \cdot k| : (x,y) \in D_i,\ i = 1, \ldots, n\} \\
&\le \sup\{s_i\,|f(x,y) - g(x,y)| : i = 1, \ldots, n\} \\
&\le \sup\{s\,|f(x,y) - g(x,y)|\} \\
&\le s\,\sup\{|f(x,y) - g(x,y)|\} \\
&\le s\,\delta(f, g),
\end{aligned}$$

where k is the unit vector in the z direction. ∎


For  completeness,  the  following  theorem  is  included,  using  the  notation  of  this 
section. 

Contractive Mapping Fixed Point Theorem.  Let F be a complete metric space
with metric $\delta$. If $W : F \to F$ is a contraction, then there exists a unique point
$g \in F$ such that $g = W(g)$. Moreover, for any $f \in F$, the fixed point is the limit
$g = \lim_{n \to \infty} W^{\circ n}(f)$.

Following Hutchinson's notation (1981), the fixed point is denoted $|W| = g =
\lim_{n \to \infty} W^{\circ n}(f)$. Then

$$|W| = W(|W|) = \bigcup_{i=1}^{n} w_i(|W|). \tag{5}$$

The transformation W is said to encode an image $f \in F$ if $f = |W|$. Given W, it
is easy to find the image that it encodes: begin with any image $f_0$ and successively
compute $W(f_0), W(W(f_0)), \ldots$ until the images converge to $|W|$. The converse is
considerably more difficult: given an image f, how is a mapping W found such that
$|W| = f$? There is no general, nontrivial solution to this problem. Instead an image
$f' \in F$ can be found such that $\delta(f, f')$ is minimal with $f' = |W|$. Equation 5 suggests
how this might be possible. Domains $D_1, \ldots, D_n$ are sought with corresponding
transformations $w_1, \ldots, w_n$ such that

$$f \approx W(f) = \bigcup_{i=1}^{n} w_i(f). \tag{6}$$

This equation says: cover f with parts of itself; the parts are defined by the $D_i$ and
the way those parts cover f is determined by the $w_i$. Equality in equation 6 would
imply that $f = |W|$. Since one cannot hope to exactly cover f with parts of itself,
the optimal solution is sought, and then one hopes that $|W|$ and f will not look too
different, i.e., that $\delta(|W|, f)$ is small. The following observation (Barnsley, 1988),
known as the Collage Theorem, gives hope that this can be done. It is a corollary of
the contractive mapping fixed point theorem.

Corollary 3.1 (Collage Theorem).  Let $W : F \to F$ be a contraction with
contractivity s and let $f \in F$ be an image. Then

$$\delta(|W|, f) \le \frac{1}{1-s}\,\delta(W(f), f).$$

The problem is to find a W such that $\delta(W(f), f)$ is minimized and such that s
is small. In that case, $|W|$ will be close (in $\delta$) to f. However, as will be shown in
section 5, the bound in the corollary is not very good; it provides motivation only and
not a useful bound in practice. In fact, it is possible to generate examples in which
the bound in the corollary is arbitrarily large while $\delta(|W|, f)$ is bounded. Empirical
results (see table 2) show that restricting s to be small can result in a bound, but a
less accurate reconstructed image.
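
The corollary can be checked numerically on a toy analog. The sketch below is not
the image model of this section: it replaces images by short lists of samples, and
each map copies one sample, scales it by s, and adds an offset o. All names and
numbers are illustrative assumptions. Decoding is the fixed point iteration described
above, started from an all-zero list.

```python
def sup_dist(f, g):
    """Sup metric between two equally long sample lists."""
    return max(abs(a - b) for a, b in zip(f, g))


def apply_W(f, maps):
    """maps[j] = (d, s, o): output sample j is s * f[d] + o."""
    return [s * f[d] + o for (d, s, o) in maps]


def fixed_point(maps, n_iter=200):
    """Approximate |W| by iterating W from an arbitrary starting list."""
    f = [0.0] * len(maps)
    for _ in range(n_iter):
        f = apply_W(f, maps)
    return f


# Hypothetical encoding of a 4-sample "image"; every |s| is below 1.
maps = [(2, 0.5, 0.1), (0, -0.4, 0.6), (3, 0.3, 0.2), (1, 0.5, 0.0)]
target = [0.3, 0.5, 0.25, 0.2]

s = max(abs(scale) for (_, scale, _) in maps)            # contractivity of W
collage_error = sup_dist(apply_W(target, maps), target)  # delta(W(f), f)
decoded = fixed_point(maps)                              # approximates |W|
true_error = sup_dist(decoded, target)                   # delta(|W|, f)
bound = collage_error / (1.0 - s)                        # right side of the corollary
```

In this toy case the decoded list satisfies the bound, as the corollary requires; how
tight the bound is depends on the particular maps chosen.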

It is always possible to approximate any given image of finite resolution f to within
any $\epsilon > 0$. This can be done by simply mapping the whole image onto each pixel,
for example. However, this is not a deep point, because compression is sacrificed in
order to achieve accuracy; that is, a large number of maps $w_1, \ldots, w_n$ is required in
order to have $\delta(f, W(f))$ small.

3.3  EVENTUALLY  CONTRACTIVE  MAPPINGS 

This section contains a description of a more general class of transformations used
to encode images. Those initiated to IFS theory (from which the example in section 2
is drawn) may find it surprising that when the transformations $w_i$ are constructed, it
is not necessary to impose any contractivity conditions on the individual transforms,
not in the x, y, or z axis. In fact, for all sets of transformations given in this report
(except for the two special cases given in table 2) the $w_i$'s are not forced to be z
contractive.

A map $W : F \to F$ is eventually contractive if there exists a positive integer m
such that $W^{\circ m}$ is contractive. The exponent m is called the exponent of eventual
contractivity. Note that for any set of transforms for which an m exists, there is a
minimum value of m. All contractive maps are eventually contractive, but not vice
versa.

The following is a generalization of Corollary 3.1. As before, it is assumed that
$w_1, \ldots, w_n$ tile $I^2$. The z-contractivity of $w_i$ is the smallest number s satisfying
equation 4, without the requirement that $s < 1$. Let

$$s_{\max} = \max_{i=1,\ldots,n} \{s : s = \text{z-contractivity of } w_i\}.$$
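
Claim 3.2  For $f \in F$ and $W : F \to F$ eventually contractive with minimum exponent
of eventual contractivity m and eventual contractivity $\sigma < 1$,

$$\delta(|W|, f) \le \frac{1}{1-\sigma}\,\frac{1 - s_{\max}^{m}}{1 - s_{\max}}\,\delta(W(f), f).$$

Proof.  The proof follows the same lines as the proof of corollary 3.1 (Barnsley,
1988). Since $W^{\circ m}$ has contractivity $\sigma$, Corollary 3.1 implies that

$$\begin{aligned}
\delta(|W^{\circ m}|, f) &\le \frac{1}{1-\sigma}\,\delta(W^{\circ m}(f), f) \\
&\le \frac{1}{1-\sigma} \sum_{k=1}^{m} \delta(W^{\circ k}(f), W^{\circ (k-1)}(f)) \\
&\le \frac{1}{1-\sigma} \sum_{k=1}^{m} s_{\max}^{k-1}\,\delta(W(f), f) \\
&= \frac{1}{1-\sigma}\,\frac{1 - s_{\max}^{m}}{1 - s_{\max}}\,\delta(W(f), f).
\end{aligned}$$

It remains to show that $|W| = |W^{\circ m}|$. This follows since $|W^{\circ m}| = \lim_{n \to \infty} W^{\circ n}(g)$ is
independent of g. ∎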

With  some  extra  notation,  it  is  possible  to  improve  this  estimate,  but  since  this 
claim  and  Corollary  3.1  provide  motivation,  not  serious  bounds,  it  serves  little  purpose 
to  do  so. 

A brief explanation of how a transformation $W : F \to F$ can be eventually
contractive but not z contractive is in order. The map W is composed of a union of
maps $w_i$ operating on disjoint parts of an image. If any of the $w_i$ are not z contractive,
then W will also not be contractive. The iterated transform $W^{\circ m}$ is composed of a
union of compositions of the form

$$w_{i_1} \circ w_{i_2} \circ \cdots \circ w_{i_m}.$$

Since the contractivities multiply to yield the contractivity of the composition, the
compositions may be contractive if each contains sufficiently contractive $w_{i_j}$. Thus W
will be eventually contractive if it contains sufficient "mixing" so that the contractive
$w_i$ eventually dominate the expansive ones. In practice, this condition is relatively
simple to check.
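
One way to picture such a check is the following simplified sketch. It assumes a toy
setting in which each map reads a single sample from the part produced by another
map, so that every composition of length m follows one chain and its contractivity is
the product of the scale factors along the chain. The data layout and function names
are hypothetical; in the image setting each range depends on a whole domain square,
so a practical check has to follow every range that a domain overlaps.

```python
def composition_factor(maps, j, m):
    """Product of |s| along the chain of m maps that produces sample j."""
    product = 1.0
    for _ in range(m):
        source, scale = maps[j]
        product *= abs(scale)
        j = source
    return product


def eventual_contractivity(maps, max_m=50):
    """Return (m, sigma) for the smallest m whose compositions all contract."""
    for m in range(1, max_m + 1):
        sigma = max(composition_factor(maps, j, m) for j in range(len(maps)))
        if sigma < 1.0:
            return m, sigma
    return None  # no contractive power found up to max_m


# maps[j] = (source sample, scale factor); the first map is expansive (1.5).
maps = [(1, 1.5), (2, 0.4), (3, 0.5), (0, 0.6)]
result = eventual_contractivity(maps)
```

Here W itself is not contractive because of the factor 1.5, but every composition of
two maps has a factor of at most 0.9, so the exponent of eventual contractivity is 2.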


4.  THE  IMPLEMENTATION 


This  section  describes  an  implementation  of  the  image  compression  algorithm 
described  in  the  previous  sections.  Many  features  of  this  implementation  are  similar 
to  that  of  Jacquin  (1989).  The  differences  will  be  examined  in  the  discussion  section. 

In brief, the encoding of an image can be described as follows. Recall that the
$w_i$ are maps into $I^3$ and that $D_i, R_i \subset I^2$. The $R_i$ will be called ranges and the $D_i$
domains, even though they are not the domains and ranges of the $w_i$. Nevertheless,
the terminology is commonly used.

To encode an image f, $w_i$ and $D_i$ which tile $I^2$ must be found such that W (defined
by equation 3) is contractive or eventually contractive. Since the goal is to limit the
memory required to specify W, $I^2$ is partitioned by geometrically simple sets $R_i$ with
$\bigcup_{i=1}^{n} R_i = I^2$. For each $R_i$, a $D_i \subset I^2$ and $w_i : D_i \times I \to I^3$ is sought such that $w_i(f)$
is as $\delta$ close to $f \cap (R_i \times I)$ as possible; that is,

$$\delta(f \cap (R_i \times I),\ w_i(f)) \tag{7}$$

is minimized. To limit the memory required to specify $w_i$, only maps of the form

$$w_i \begin{bmatrix} x \\ y \\ z \end{bmatrix} =
\begin{bmatrix} a_i & b_i & 0 \\ c_i & d_i & 0 \\ 0 & 0 & s_i \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix} +
\begin{bmatrix} e_i \\ f_i \\ o_i \end{bmatrix} \tag{8}$$

are considered, where $w_i$ is restricted to $D_i \times I$. This constrains the sets $D_i$ as
well, since they must (after projection by $\pi_z$) map onto the $R_i$. In fact, as will be
described in more detail in the following paragraphs, further restrictions are placed
upon the possible values for the coefficients in equation 8, thus resulting in a compact
specification for the $w_i$.
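
As a concrete reading of equation 8, the sketch below applies one such map to a single
point (x, y, z). The tuple layout and the sample coefficients are assumptions made for
illustration; the spatial part acts only on (x, y), while the gray level z is scaled
by s and shifted by o independently of position.

```python
def apply_w(coeffs, point):
    """Apply a map of the form of equation 8 to the point (x, y, z)."""
    a, b, c, d, e, f, s, o = coeffs
    x, y, z = point
    return (a * x + b * y + e,   # new x depends only on (x, y)
            c * x + d * y + f,   # new y depends only on (x, y)
            s * z + o)           # new gray level depends only on z


# Hypothetical map: halve the spatial coordinates, shift by (16, 32),
# scale the gray level by 0.75 and add an offset of 20.
coeffs = (0.5, 0.0, 0.0, 0.5, 16.0, 32.0, 0.75, 20.0)
mapped = apply_w(coeffs, (8.0, 4.0, 100.0))
```

Two points that differ only in z are mapped to points that differ only in z, by a
factor of |s|, which is what the z-contractivity of the map measures.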

There  are  two  fairly  independent  considerations  for  implementing  the  encoding 
algorithm. First, the set D of all possible domains from which the $D_i$'s are chosen
to encode the image must be defined. Similarly, the set R of all possible ranges from
which the $R_i$'s are chosen to encode the image must be defined. Although, in general,
R and D can be chosen to be any collection of subsets of $I^2$, in this implementation
they  are  taken  to  be  collections  of  squares  only.  The  choice  of  R  and  D  limits  the 
amount  of  information  required  to  specify  the  geometry  of  the  sets  thereby  increasing 
the  resulting  compression.  This  choice  also  simplifies  many  of  the  computations.  On 
the  other  hand,  it  severely  restricts  the  encoding  process,  in  the  sense  that  it  becomes 
more  difficult  to  collage  an  image  by  transformations  of  parts  of  itself. 

For 256 x 256 pixel images the model of section 3 is scaled to [0, 255] x [0, 255] x
[0, 255]. (For other image sizes, appropriate scaling was employed.) The set R is
chosen to consist of 4 x 4, 8 x 8, 16 x 16, and 32 x 32 nonoverlapping subsquares of
[0, 255] x [0, 255]. The collection D consists of 8 x 8, 16 x 16, 24 x 24, 32 x 32, 48 x 48,
and  64  x  64  subsquares  with  sides  which  are  parallel  to  or  slanted  at  45  degree  angles 
from  the  natural  edges  of  the  image.  To  reduce  the  amount  of  information  required 
to  specify  a  particular  domain  square,  domain  squares  of  size  s  x  s  are  restricted  to 
be  centered  on  a  lattice  with  vertical  and  horizontal  spacing  of  s/2. 

Given a range square $R_i$ and a domain square $D_i$, there are eight possible orientations
for $D_i$ to have when mapped onto $R_i$ by a map of the form of equation 8. These
orientations are shown in figure 5. The size and position of $R_i$ and $D_i$, and a given
orientation define the coefficients $a_i, b_i, c_i, d_i, e_i$, and $f_i$ in equation 8. Insisting that
$w_i$ map (the graph above) $D_i$ to (a graph above) $R_i$ while minimizing equation 7
determines $s_i$ and $o_i$. In this way $w_i$ is determined uniquely for a chosen metric. In
the examples in section 5, the root mean square (rms) error ($\delta_{rms}$) was chosen as
the metric. This choice was made for ease of implementation and the fact that $\delta_{rms}$
is more reflective of visual accuracy than $\delta$. However, $\delta_{rms}$ is not an ideal measure
of image fidelity, and it may be that among the various $L_p$ norms, there is a better
choice. The proofs in section 3 do not hold for the root mean square metric. This
does not raise a serious objection to the use of $\delta_{rms}$, since when the iterates of a
transformation converge in $\delta$, they converge in $\delta_{rms}$.

Figure 5.  Eight symmetries of the square.
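
With the rms metric, the scale and offset that minimize equation 7 for a given pairing
have a closed form. The report does not spell the computation out; the sketch below
uses ordinary least squares, which is the usual way to minimize an rms error, and
assumes that the domain square has already been oriented and averaged down to the
same number of pixels as the range square. The function name and the flat-list pixel
layout are illustrative.

```python
import math


def fit_scale_offset(domain, rng):
    """Least-squares s and o so that s * domain + o approximates rng.

    Returns (s, o, rms) where rms is the root mean square error per pixel.
    """
    n = len(rng)
    sum_d = sum(domain)
    sum_r = sum(rng)
    sum_dd = sum(d * d for d in domain)
    sum_dr = sum(d * r for d, r in zip(domain, rng))
    denom = n * sum_dd - sum_d * sum_d
    if denom == 0:
        # Flat domain: the scale is irrelevant, so use the mean as the offset.
        s, o = 0.0, sum_r / n
    else:
        s = (n * sum_dr - sum_d * sum_r) / denom
        o = (sum_r - s * sum_d) / n
    squared_error = sum((s * d + o - r) ** 2 for d, r in zip(domain, rng))
    return s, o, math.sqrt(squared_error / n)


# Hypothetical pixel values: the range is exactly 2 * domain + 3.
domain_pixels = [10.0, 20.0, 30.0, 50.0]
range_pixels = [23.0, 43.0, 63.0, 103.0]
s_fit, o_fit, rms_fit = fit_scale_offset(domain_pixels, range_pixels)
```

In an encoder the fitted values would still have to be restricted to the stored
precision, which can increase the error slightly.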

Once the choice of R and D is made, the encoding problem is reduced to choosing a
good set $\{R_i\} \subset R$, and the corresponding set $\{D_i\} \subset D$, such that good compression
and an accurate encoding of the image results. The number of transformations is
exactly the number of $R_i$'s; therefore, the compression is inversely proportional to
the number of $R_i$'s used to tile the image. To take advantage of local "flatness" in
the  image  and  to  reduce  the  error  in  regions  of  high  variability,  a  recursive  quadtree 
partitioning  method  is  used  to  allow  the  range  squares  to  vary  in  size  depending  on 
the  local  conditions  in  the  image. 

The choice of $D_i$'s affects the accuracy of the image, and the method used to
find the $D_i$'s determines how much computation time the encoding takes. A search
through all of D would clearly result in the choice that would best minimize
equation 7, but for applications where encoding time is a consideration, such a search
may require too much computation time. To overcome this problem, a classification
scheme is used to classify all possible domains in advance. The current range square
is classified using the same scheme, and a search for the optimal domain square in the
same class (or similar classes) is performed. If the best domain square and its
corresponding w result in an error less than a predetermined tolerance, they are stored and
the process is repeated for the next range square. If the predetermined tolerance is
not satisfied, the range square is subdivided into four equal squares, and the process
is repeated until the tolerance condition is satisfied, or a range square of the minimum
size is reached.
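
The recursion described above can be sketched as follows. The search itself is left
abstract: best_match is a hypothetical callback that returns the best transformation
found for a range square together with its error, standing in for the classified
domain search. The stub used in the example is artificial and only exercises the
subdivision logic.

```python
def encode_quadtree(x, y, size, best_match, tolerance, min_size, out):
    """Encode the range square at (x, y), subdividing while the error is too large."""
    transform, error = best_match(x, y, size)
    if error < tolerance or size <= min_size:
        # Accept: either the tolerance is met or the minimum size is reached.
        out.append((x, y, size, transform))
        return
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            encode_quadtree(x + dx, y + dy, half, best_match,
                            tolerance, min_size, out)


def stub_match(x, y, size):
    """Artificial search result: only squares touching the origin match badly."""
    error = 100.0 if (x == 0 and y == 0) else 0.0
    return "transform-placeholder", error


squares = []
encode_quadtree(0, 0, 32, stub_match, tolerance=5.0, min_size=4, out=squares)
covered_area = sum(size * size for (_, _, size, _) in squares)
```

With this stub one 32 x 32 square is split down to 4 x 4 along the corner at the
origin, giving ten range squares that together still cover the original square.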

Initially, the range squares $R_i$ were chosen to be the 64 subsquares of size 32 x 32 in an
image. The minimum size allowed for the range squares was usually 4 x 4 (although,
as  mentioned  below,  sometimes  this  was  increased  to  8  x  8).  For  each  range  square 
tested,  a  domain  square  with  side  lengths  greater  than  the  side  lengths  of  the  range 
square  was  sought,  such  that  the  condition  of  equation  7  was  minimized. 

The domain and range squares were classified in the following way. Each square
was divided into quadrants which were ordered from the brightest average intensity to
the darkest average intensity. A symmetry operation consisting of rotations and flips
was applied to bring the brightest, second brightest, and third brightest quadrants
into one of the three canonical positions shown in figure 6. Once divided into these
three major classes, the quadrants of each square were ordered from most edge-like
to least edge-like. This ordering results in 4! = 24 possible orderings, for a total of
72 classes.

Figure 6.  Three canonical orientations for square partitions.
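
The class count can be verified by enumeration. There are 4! = 24 ways to rank four
quadrants by brightness, and the eight symmetries of the square group these into
24/8 = 3 classes. The sketch below represents a square by the brightness ranks of its
quadrants read row by row; choosing the smallest tuple as the representative is an
arbitrary convention for illustration and is not meant to reproduce the positions in
figure 6.

```python
from itertools import permutations


def rotate(g):
    """Rotate a 2 x 2 grid (a, b, c, d), read row by row, 90 degrees clockwise."""
    a, b, c, d = g
    return (c, a, d, b)


def flip(g):
    """Mirror a 2 x 2 grid left to right."""
    a, b, c, d = g
    return (b, a, d, c)


def orbit(g):
    """All arrangements reachable from g by the eight symmetries of the square."""
    out = set()
    for base in (g, flip(g)):
        for _ in range(4):
            out.add(base)
            base = rotate(base)
    return out


def canonical(g):
    """Pick one fixed representative of the orbit (here: the smallest tuple)."""
    return min(orbit(g))


# Rank 0 marks the brightest quadrant, rank 3 the darkest.
major_classes = {canonical(p) for p in permutations(range(4))}
total_classes = len(major_classes) * 24  # times the 24 edge-likeness orderings
```

The symmetry that carries a square to its representative is the one a search would
then use when comparing that square with others of the same class.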

The  motivation  for  the  classification  scheme  is  simple.  By  orienting  a  square  into 
one  of  the  canonical  positions,  a  symmetry  operation  is  determined  that  is  likely  to  be 
optimal  (in  the  sense  of  minimizing  equation  7).  When  searching  for  domains  during 
the encoding process, the symmetry operation determined by the classification scheme
is  used  (the  other  seven  possible  orientations  are  not  tested).  The  further  classification 
by  edge-like  character  is  an  attempt  to  maintain  edge  fidelity,  since  visual  perception 
of  image  quality  is  sensitive  to  edge  integrity. 

Before  discussing  the  actual  results,  it  is  necessary  to  describe  in  more  detail  the 
information  that  needs  to  be  stored  to  determine  the  transformations: 


•  Size of the range square $R_i$ (the position of $R_i$ can be implicitly determined by
the order of storage),

•  Size, position, and orientation (i.e., 0 or 45 degrees) of the domain square $D_i$,

•  Symmetry  operation, 

•  Scale  factor  (s),  and 

•  Offset  (o). 

The restriction on W to be eventually contractive places no a priori limitation
on the z-scaling s and the z-offset o. However, s and o must be stored using some
fixed number of bits. Therefore, the restriction $0.05 \le |s| \le 2$ or s = 0 was used (this
set can be stored to within 6 percent accuracy by 7 bits). The z-offset o was stored
using 8 bits and could take the value of every fourth integer from -512 to 508.
When  the  remaining  data  were  stored  efficiently,  the  average  storage  requirement  per 
transformation  was  approximately  31  bits. 
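
The offset grid stated above has exactly 256 levels, which is what 8 bits can index.
The report does not say how the 7 bits for s were laid out; the sketch below assumes,
purely for illustration, one sign bit plus logarithmically spaced magnitudes between
0.05 and 2, with one code reserved for s = 0. All constants other than the stated
limits are hypothetical.

```python
import math

OFFSET_MIN, OFFSET_STEP, OFFSET_LEVELS = -512, 4, 256  # -512, -508, ..., 508


def quantize_offset(o):
    """Return (code, value) for the nearest stored offset."""
    code = round((o - OFFSET_MIN) / OFFSET_STEP)
    code = max(0, min(OFFSET_LEVELS - 1, code))
    return code, OFFSET_MIN + OFFSET_STEP * code


# Assumed layout for s: 63 log-spaced magnitudes per sign plus a zero code,
# 127 codes in total, which fits in 7 bits.
S_MIN, S_MAX, MAG_LEVELS = 0.05, 2.0, 63
RATIO = (S_MAX / S_MIN) ** (1.0 / (MAG_LEVELS - 1))


def quantize_scale(s):
    """Return the nearest stored scale factor under the assumed layout."""
    if s == 0:
        return 0.0
    magnitude = min(max(abs(s), S_MIN), S_MAX)
    k = round(math.log(magnitude / S_MIN) / math.log(RATIO))
    return math.copysign(S_MIN * RATIO ** k, s)


offset_code, offset_value = quantize_offset(101)
worst_relative_error = max(
    abs(quantize_scale(v / 1000.0) - v / 1000.0) / (v / 1000.0)
    for v in range(50, 2001)
)
```

Under this assumed layout neighbouring magnitudes differ by about 6 percent, so
rounding to the nearest level keeps the relative error near 3 percent.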

To decode an encoded image, any initial $f_0 \in F$ is chosen and the map $W : F \to F$
is iterated until the iterates converge to a fixed point $|W|$. The fixed point is the
decoded image.


5.  RESULTS 


This section presents some results from this particular implementation. Table 1
summarizes the results for the figures shown. In the table, "$\delta_{rms}(|W|, f)$" is the rms
error per pixel, "$N_D$" is the average number of domain squares searched for each range
square, "Time" is the cpu seconds required for the encoding on a Convex C210, "n" is
the number of transformations in the encoding, "Size" is the size of the original figure
(and the decoded result), and "Comp" is the resulting compression for the figures.
The values of the time are shown for comparison between the various encodings. The
times have since been reduced dramatically by optimizing the encoding program.


Table 1.  Results for figures 8, 10, and 11d.

Figure   δ_rms(|W|,f)     N_D     Time      n    Size    Comp
8a          14.05         3035     1103    277    256    63.0
8b           8.73         4270     2797    601    256    28.2
8c           7.68        11340     5453   1003    256    16.6
8d           8.12         1472      738   1057    256    15.8
10           6.33        55184    96485   3949    512    15.9
11d          8.59        14155     9591   1654    256    10.0
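
As a consistency check, the storage per transformation implied by each row of table 1
can be computed from the image size, the compression, and n. The sketch assumes 8 bits
per pixel for every original image and takes the compression values for figures 8a to
8c from the caption of figure 8; the dictionary layout and function name are
illustrative.

```python
# (n, side length in pixels, compression) for each encoding in table 1.
TABLE_1 = {
    "8a": (277, 256, 63.0),
    "8b": (601, 256, 28.2),
    "8c": (1003, 256, 16.6),
    "8d": (1057, 256, 15.8),
    "10": (3949, 512, 15.9),
    "11d": (1654, 256, 10.0),
}


def bits_per_transformation(n, side, compression, bits_per_pixel=8):
    """Encoded size in bits divided by the number of transformations."""
    original_bits = side * side * bits_per_pixel
    return original_bits / compression / n


implied_bits = {
    figure: bits_per_transformation(*row) for figure, row in TABLE_1.items()
}
```

The implied values fall between roughly 30 and 34 bits per transformation, in line
with the average of approximately 31 bits quoted in section 4.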

Figure  7  is  the  original  256  x  256  8-bpp  image  of  a  dog.  Figure  8  shows  four  images 
reconstructed  from  encodings  of  figure  7.  In  figures  8a  and  8b,  the  lowest  level  of  the 
quadtree partitioning was 8 x 8 squares (rather than 4 x 4 squares). Figures 8c and 8d
have  comparable  compression  and  demonstrate  the  effect  of  the  classification  scheme 
on  computation  time.  The  average  number  of  domain  squares  searched  per  range 
is  determined  by  the  number  of  classes  searched  when  encoding.  Searching  through 
more  classes  does  yield  a  better  encoding  but,  as  can  be  seen  from  table  1,  results  in 
longer  computation  time. 
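
As a rough consistency check on table 1, the average storage per transformation can be recovered from the compression, the image size, and n. The sketch below assumes that compression is simply original bits divided by encoded bits with no other overhead; the function name is hypothetical, and the compressions for figures 8a to 8c are taken from the caption of figure 8.

```python
def bits_per_transformation(size, n, comp, bpp=8):
    """Average encoded bits per transformation implied by one table row,
    assuming comp = (original bits) / (encoded bits)."""
    original_bits = size * size * bpp
    return original_bits / (comp * n)


# (Size, n, Comp) for each encoding discussed in this section.
rows = {
    "8a": (256, 277, 63.0),
    "8b": (256, 601, 28.2),
    "8c": (256, 1003, 16.6),
    "8d": (256, 1057, 15.8),
    "10": (512, 3949, 15.9),
    "11d": (256, 1654, 10.0),
}
implied = {name: bits_per_transformation(*row) for name, row in rows.items()}
for name, bits in implied.items():
    print(name, round(bits, 1))
```

Every row comes out between roughly 30 and 34 bits, in line with the figure of approximately 31 bits per transformation quoted in section 4.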


Figure 7. Original figure of the dog.

Figure  8.  Four  reconstructed  images  of  the  image  of  the  dog.  The  compressions  are 
(a)  63.0:1,  (b)  28.2:1,  (c)  16.6:1,  and  (d)  15.8:1. 


Figure  9  shows  the  commonly  used  512  x  512  image  of  Lena.  Figure  10  shows  a 
reconstructed  Lena  with  compression  15.9:1  and  an  rms  error  per  pixel  of  6.33  (32.1 
dB  signal-to-noise  ratio).  This  image  has  also  been  compressed  at  35.9:1  with  an  rms 
error per pixel of 9.22 (28.8 dB).

Figure 9. Original 512 x 512 image of Lena.




Figure  10.  Decoded  Lena  at  15.9:1  compression. 
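
The decibel values quoted for Lena are consistent with a peak signal-to-noise ratio computed from the rms error with a peak value of 255 for 8-bpp data. That the report uses exactly this definition is an assumption; the sketch below, with a hypothetical function name, shows the arithmetic.

```python
import math


def psnr_db(rms_error, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given rms error per pixel."""
    return 20.0 * math.log10(peak / rms_error)


lena_15_9 = psnr_db(6.33)  # encoding at 15.9:1
lena_35_9 = psnr_db(9.22)  # encoding at 35.9:1
```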


6.  DISCUSSION 


For illustrative purposes, the decoding process for the encoded image Lena is
shown in figure 11. This encoding was performed on a 256 x 256 original version of the
image. Figure 11a is the initial image f. The grid pattern was chosen to illustrate the
nature of the mappings in the subsequent figures. This choice is, of course, arbitrary.
Figure 11b is the first iterate W(f); figure 11c is the second iterate W^{∘2}(f); and
figure 11d is the tenth iterate W^{∘10}(f) ≈ |W|, which approximates the fixed point of
the system to within the resolution of the image. For all of the test images shown,
decoding requires a comparable number of iterations. This is of particular interest
because, as discussed in section 3, the w_i's are not restricted to be z contractive. The
distribution of the scale factors used to encode the image of Lena (figure 11d) is shown
in figure 12. It illustrates that there are a significant number of transformations that
are not z contractive. Note that to maintain a constant accuracy, the possible values
of the scale factor that are larger in magnitude are sparser.

Since the relaxation of z contractivity has not been previously used, it is worthy
of further examination. Let f be the 256 x 256 8-bpp image shown
in figure 13a. Let R_1, ..., R_1024 be the 1024 nonintersecting 8 x 8 subsquares of
[0, 255] x [0, 255]. Let D be the collection of all 16 x 16 subsquares of [0, 255] x [0, 255]
whose edges coincide with the edges of the R_i. This choice of R and D is simpler than
that used for other results in this report. For each R_i, find the best domain in D,
denoted D_i, and a w_i of the form of equation 8 such that π_z(w_i(f)) = R_i and such that
equation 7 is minimized. In this example, D_i and w_i were chosen by searching through
all possible choices for the pair that minimizes equation 8. To ensure that
lim_{m→∞} W^{∘m} exists, the w_i must be z contractive or W must be eventually
contractive. In this example, the scale factor was not restricted to be between ±2 as
in other results in this report. Rather, s was restricted to be between ±0.7 or ±0.9
(two z-contractive mappings), or s had no restriction (this resulted in an eventually
contractive mapping with maximum “z contractivity” = 4.05). Figure 13b shows the
reconstructed image from the eventually contractive mapping.
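
The effect of limiting the scale factor can be sketched with a least-squares fit of one range square to one domain square. This is an illustration under assumptions, not the report's search: it assumes the grayscale part of each map has the form z → s z + o, that the domain pixels have already been reduced to the size of the range, and that s and o are not quantized. The function name and the pixel values are hypothetical.

```python
import math


def fit_scale_offset(domain, rng, s_max=None):
    """Least-squares fit of rng ~ s * domain + o over two equal-length
    pixel lists, with |s| optionally limited to s_max."""
    n = len(rng)
    mean_d = sum(domain) / n
    mean_r = sum(rng) / n
    var_d = sum((d - mean_d) ** 2 for d in domain)
    cov = sum((d - mean_d) * (r - mean_r) for d, r in zip(domain, rng))
    s = cov / var_d if var_d > 0 else 0.0
    if s_max is not None:
        # The error is quadratic in s, so clamping the unconstrained
        # optimum gives the best s inside [-s_max, s_max].
        s = max(-s_max, min(s_max, s))
    o = mean_r - s * mean_d  # best offset for the chosen s
    err = math.sqrt(sum((s * d + o - r) ** 2
                        for d, r in zip(domain, rng)) / n)
    return s, o, err


# Hypothetical pixel values: the range is exactly 3 * domain + 10.
domain = [0.0, 1.0, 2.0, 3.0]
rng = [10.0, 13.0, 16.0, 19.0]
free = fit_scale_offset(domain, rng)
tight = fit_scale_offset(domain, rng, s_max=0.9)
tighter = fit_scale_offset(domain, rng, s_max=0.7)
```

In this toy case the unrestricted fit is exact, and the rms error of the fit grows as the limit on s is tightened from 0.9 to 0.7.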

Table 2 summarizes some results for this example. In the table, σ is the
contractivity of W^{∘m}, m is an exponent of eventual contractivity, n is the number
of maps w_i, and δ-bound is the bound on δ(|W|, f) given by claim 3.2 or corollary 3.1.
Because δ_rms is a better measure of the visual accuracy of an encoding than δ, it is
used for δ_rms(W(f), f) and δ_rms(|W|, f). The bound in table 2 is given as δ because
that is the metric for which corollary 3.1 and claim 3.2 are valid.

(a) An initial image f. (b) The first iterate W(f).


(c) The second iterate W^{∘2}(f). (d) The 10th iterate W^{∘10}(f).


Figure  11.  Decoding  process  for  Lena. 


Table 2. Results for encodings of figure 13a using different constraints on the
allowable scale factors.

Figure 13. (a) A sample image of Mara and (b) the fixed point for the encoded
Mara.

The first two entries in table 2 are data for encodings resulting from z-contractive
maps. The third entry is an encoding with an eventually contractive map. The table
demonstrates several points. The most relevant point is that the eventually
contractive map results in a moderately more accurate encoding. It is interesting to note
that the difference between δ(W(f), f) and δ(|W|, f) for the eventually contractive
mapping is not greater than this difference for the z-contractive mappings. The table
also demonstrates that, even though restricting the z-contractivity of the w_i's (from
0.9 to 0.7 in the z-contractive mappings) improves the bound from corollary 3.1, it
worsens the fidelity of the reconstructed image. It is evident that the bound given
by claim 3.2 for the eventually contractive map is also poor. This demonstrates that
the bounds in corollary 3.1 and claim 3.2 should be viewed as motivation rather than
actual bounds.
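
To see why a tighter limit on the scale factor improves the bound, consider the standard collage-theorem estimate δ(|W|, f) ≤ δ(W(f), f) / (1 - s) for a map with contractivity s < 1. Whether corollary 3.1 has exactly this form is an assumption here, since it is not restated in this section; the function name below is hypothetical.

```python
def collage_factor(s):
    """Multiplier 1 / (1 - s) in the standard collage bound for a
    contractive map with contractivity 0 <= s < 1."""
    if not 0.0 <= s < 1.0:
        raise ValueError("contractivity must satisfy 0 <= s < 1")
    return 1.0 / (1.0 - s)


factor_07 = collage_factor(0.7)
factor_09 = collage_factor(0.9)
```

Under this assumption the multiplier drops from about 10 to about 3.3 when the limit is tightened from 0.9 to 0.7, so the bound can improve even when the decoded image itself does not.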

While the use of noncontractive transforms does improve image fidelity, storing
the transforms requires more bits when the transforms are not limited to be
contractive. In the quadtree implementation, it is not necessarily true that the
addition of bits to each transform must result in a loss of compression. This is
because the additional scale factors may allow for a larger number of bigger ranges
to be covered successfully. This issue of fidelity vs. compression is still under
investigation.

Another point of interest is the distribution of distances between R_i and D_i. If this
distribution indicates that a disproportionate number of domain squares are relatively
near to their range squares, then it might be advantageous to limit the set D to
a localized area around each R_i. This probability distribution measures the degree
of local self-similarity within an image. Figure 14 shows this distribution for the
encoded image of Lena (figure 11d). The solid curve is the theoretical distribution
of the distance between two randomly chosen points in I^2. The figure shows that
there is no significant local self-similarity; the slight shift between the theoretical
and experimental distributions is due in part to the distance being measured between
small but not infinitesimal squares. The distribution is typical for the test images
studied.
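
For reference, the distance between two independent, uniformly distributed points in the unit square has a known closed-form density. The sketch below evaluates it and checks numerically that it integrates to 1. Normalizing image coordinates to the unit square, and the function names, are assumptions made for illustration.

```python
import math


def square_distance_pdf(d):
    """Density of the distance between two independent uniform points
    in the unit square (zero outside 0 < d < sqrt(2))."""
    if d <= 0.0 or d >= math.sqrt(2.0):
        return 0.0
    if d <= 1.0:
        return 2.0 * d * (d * d - 4.0 * d + math.pi)
    r = math.sqrt(d * d - 1.0)
    return 2.0 * d * (4.0 * r - (d * d + 2.0 - math.pi) - 4.0 * math.atan(r))


def integrate(fn, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of fn over [a, b]."""
    h = (b - a) / steps
    return h * sum(fn(a + (i + 0.5) * h) for i in range(steps))


total = integrate(square_distance_pdf, 0.0, math.sqrt(2.0))
mean = integrate(lambda d: d * square_distance_pdf(d), 0.0, math.sqrt(2.0))
```

A histogram of measured domain-range distances, scaled by the image side length, could be compared against this curve.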

Figure 14. Theoretical distribution for random domain-range distance and the
actual distribution for a typical encoding.


It has already been pointed out that the relaxation of z contractivity distinguishes
the data given here from other implementations of iterated transform image
encoding. The data in figures 12 and 14 point out other major differences between this
implementation and that of Jacquin (1989). Both the number of possible domains
in D and the allowed values of the scale factor s are far more numerous in the
implementation described here. As a result, this implementation will allow (roughly)
only 2/3 the number of transformations allowed in Jacquin’s scheme at the same
compression. Generalizing the transformations increases the storage requirement per
transformation but also increases the possibility of covering larger range squares,
thereby reducing the total number of transformations needed to encode an image.
Clearly, one could imagine other schemes that would further increase (or decrease)
the storage requirement per transformation. A systematic investigation of the
dependence of system performance on this requirement has yet to be done. It should be
noted that the performance of the implementation presented here and that of Jacquin
are quite similar (as measured by the rms error at the same compression). This is of
interest because the approaches of the two implementations are quite different, with
Jacquin using a large number of transformations from a small pool, and the
implementation here using relatively few transformations from a large pool.

The size of the domain pool also affects the time necessary to search through the
pool. As described in section 4, the current implementation makes use of a
classification scheme to reduce the number of domains that need to be tested. Although the
classification scheme presented results in classifications that are sufficient to obtain
good encodings (and is the best of several tested), the data in table 1 for figures 8c
and 8d show that a search through more classes results in some minor improvement.
Clearly, more work could be done to find better classification schemes.

It should also be noted that with some postprocessing, such as smoothing the
image at the boundary of the range squares, it is possible to greatly reduce the “box”
artifacts arising from the encoding process. Such postprocessing can also decrease the
total rms error of the encoding. No postprocessing was applied to any of the images
presented in this paper.

Finally, an encoded image can be decoded at a larger size. Because the
transformations can naturally create detail at all scales, such an enlarged decoding will
not appear pixelized. In a certain sense, the resulting compression is increased, since
now a larger image is decoded from the same information. The automatic creation
of detail may be useful, but all of the images in this paper were decoded at the same
size as the original.